Creating Structured PDF Files

نویسندگان

  • Matthew R. B. Hardy
  • David F. Brailsford
  • Peter L. Thomas
چکیده

This paper describes a tool for recombining the logical structure from an XML document with the typeset appearance of the corresponding PDF document. The tool uses the XML representation as a template for the insertion of the logical structure into the existing PDF document, thereby creating a Structured/Tagged PDF. The addition of logical structure adds value to the PDF in three ways: the accessibility is improved (PDF screen readers for visually impaired users perform better), media options are enhanced (the ability to reflow PDF documents, using structure as a guide, makes PDF viable for use on hand-held devices) and the re-usability of the PDF documents benefits greatly from the presence of an XML-like structure tree to guide the process of text retrieval in reading order (e.g. when interfacing to XML applications and databases).

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A new approach to covert communication via PDF files

A new covert communication method via PDF files is proposed. A secret message, after being encoded by a special ASCII code and embedded at between-word and betweencharacter locations in the text of a PDF file, becomes invisible in the window of a common PDF reader, creating a steganographic effect for secret transmission through the PDF file. Experimental results show the feasibility of the pro...

متن کامل

PDF/A standard for long term archiving

PDF/A is defined by ISO 19005-1 as a file format based on PDF format. The standard provides a mechanism for representing electronic documents in a way that preserves their visual appearance over time, independent of the tools and systems used for creating or storing the files.

متن کامل

pdf2table: A Method to Extract Table Information from PDF Files

Tables are a common structuring element in many documents, such as PDF files. To reuse such tables, appropriate methods need to be develop, which capture the structure and the content information. We have developed several heuristics which together recognize and decompose tables in PDF files and store the extracted data in a structured data format (XML) for easier reuse. Additionally, we implem...

متن کامل

A System for Converting PDF Documents into Structured XML Format

We present in this paper a system for converting PDF legacy documents into structured XML format. This conversion system first extracts the different streams contained in PDF files (text, bitmap and vectorial images) and then applies different components in order to express in XML the logically structured documents. Some of these components are traditional in Document Analysis, other more speci...

متن کامل

Relational calculus pdf

Algebra: specifying how to obtain results. SQL: specifying real estate principles a value approach pdf how to derive.Tuple Relational Calculus TRC. Query specification involves giving a step by step process of obtaining the query.Comp 521 Files and Databases. tuple relational calculus pdf Comes in two flavors: Tuple relational calculus TRC and Domain relational calculus.y Comes in two flavours:...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2004